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Abstract 

We present a general form of the iteration and interpolation process used in implicit particle 
filters. Implicit niters are based on a pseudo- Gaussian representation of posterior densities, 
and are designed to focus the particle paths so as to reduce the number of particles needed in 
nonlinear data assimilation. Examples are given. 
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1 Introduction 

There are many problems in science in which the state of a system must be identified from an 
uncertain equation supplemented by a stream of noisy data (see e.g. [7]). A natural model of this 
situation consists of an Ito stochastic differential equation (SDE): 

dx = f(x,t) dt + g(x,t) dw, (1) 

where x — (xi, X2, ■ ■ ■ ,x m ) is an m-dimensional vector, w is m-dimensional Brownian motion, / is 
an m-dimensional vector function, and g(x,t) is an m by m diagonal matrix. The initial state x° is 
assumed given and may be random as well. 

As the solution of the SDE unfolds, it is observed, and the values b n of a measurement process 
are recorded at times t n ,n = 1, 2, ... For simplicity assume t n = nS, where 5 is a fixed time interval. 
The measurements are related to the evolving state x(t) by 

b n = h(x n ) + QW n , (2) 

where ft, is a /c-dimensional, generally nonlinear, vector function with k < m, Q is a k by k diagonal 
matrix, x n = x{nS), and W n is a vector whose components are k independent Gaussian variables of 
mean zero and variance one, independent also of the Brownian motion in equation (TT]). The task is 
to estimate x on the basis of equation {T]) and the observations 

If the system (J]) and equation ((2]) are linear and the data are Gaussian, the solution can be found 
via the Kalman-Bucy filter (see e.g. [3]). In the general case, it is natural to try to estimate x via its 
evolving probability density. The initial state x° is known and so is its probability density; all one 
has to do is evaluate sequentially the density P n +i of x n+1 given the probability densities of x k 
for k < n and the data b n+1 . This can be done by following "particles" (replicas of the system) whose 
empirical distribution approximates P n . A standard construction (see e.g pT3l [T2l f8l fTl fTTl [5l ITOl [9] ^ 
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uses the probability density function (pdf ) P n and equation |T]) to generate a prior density, and then 
uses the new data b n+1 to generate a posterior density P n +i through weighting and resampling. 
In addition, one has to sample backward to take into account the information each measurement 
provides about the past, as well as avoid having too many identical particles after resampling. This 
can be very expensive, in particular because the number of particles needed can grow catastrophically 
(see e.g. [UJ [5] and also Example 2 below). Sophisticated methods for generating efficient priors 
can be found e.g. in [HQ]. The challenge is to generate high probability samples so as to minimize 
the effort of computing particle paths whose weight is very low. 

In [5J we introduced an alternative to the standard approach. In our method the posterior den- 
sity is sampled directly by iteration and interpolation, as suggested by our earlier work on chainlcss 
sampling .4] , and by the observation in [15) connecting interpolation and the marginalization process 
used in chainless sampling. The new filter aims the particle trajectories as accurately as possible 
in the direction of the observations so that fewer particles are needed. In that earlier paper our 
approach was presented by means of simple examples. In the present paper we present a general, 
more abstract, formulation, introduce an extension to the case of sparse observations, and discuss 
additional examples. 

2 Forward step 

To begin, assume that at time t n — nS, where S > is fixed, we have a collection of M particles 
-X"") 1 < i < Af, n = 0, 1, . . . , whose empirical density approximates P nj the probability density at 
time nS of the particles that obey the evolution equation fTJ) subject to the observations @ at times 
t = kS for k < n. In the present section we explain how to find positions for the same particles 
at time (n + 1)<5 given only the positions at time nd and the pdf P n , taking into account the next 
observation and the equation of motion. Let N(a,v) denote a Gaussian variable of mean a and 
variance v. First, approximate the SDE |T]) by a difference scheme of the form 

x n+i = x n + F (x n ,t n )5 + G{X n ,t n )V n+1 , (3) 

where we assume temporarily that S equals the interval between observations, i.e., we assume that 
there is an observation at every time step. X n stands for X(n5), G is assumed to be diagonal, and 
X n ,X n+1 are m dimensional vectors. F, G determine the scheme used to solve the SDE, see for 
example [5J. V n+1 is a vector of N(0,S) Gaussian variables, independent of each other for each n, 
with the vectors V n+1 independent of each other for differing n, independent also of the W k , k = 1, 
in the observation equation ([2]). The sequence of X n , n = 0, 1, . . . approximates a sample solution 
of the SDE, X° is assumed given and may be random. The function G in ([3]) does not depend on 
X n+1 for an Ito equation, and we assume for simplicity that F does not depend on X n+1 either, 
because this was the case in all the examples we have worked on so far. The analysis below can be 
easily repeated for the case where F does depend on X n+1 , at the cost of slightly more complicated 
formulas. Equation © states that X n+1 - X n is an N(F{X n , t n )5, 8G(X n , t n )*G(X n , t™)) vector, 
where the star * denotes a transpose. 

We have one sample solution X" of the SDE for each particle. Our task is to sample, for each 
particle, the vector X™ +1 whose probability density is determined by the approximation of the SDE 
as well as by the next observation for each of the M particles. We keep the notation X™ +1 for the 
positions of the particles even though once the observation is taken into account these positions no 
longer coincide with the positions of sample solutions of equation ([3]). 

Consider the z-th particle. We are going to work particle by particle, so that the particle index i 
will be temporarily suppressed. Suppose we already know the posterior vector X n+1 . Its probability 
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density P n +i of X n+1 given X n is 

P n+1 (X n+1 ) = Z- 1 cxp (- (X n+1 - X n - F n )* {GIGn)- 1 {X n+1 - X n - F n ) /2 
- (h(X n+1 ) - b n+1 )* {Q*Qy 1 (h(X n+1 ) - b n+1 ) /2) , 

(4) 

where the functions F n = F(X n ,t n )5, and G n = VSG(X n ,t n ) can be read from the approximation 
of the SDE, and Z is a normalization constant, the integral of the numerator over all X n+1 with X n 
fixed. The value of this Z is not available. Our goal is to find samples X n+1 whose probability is 
high, and which are well distributed with respect to P n +i- We do that by picking the probability in 
advance: we first pick samples of m N(0, 1) variables (£1, ^2> • • • > £m) — whose joint pdf (probability 
density function) is exp(-£*£/2))/(27r) m / 2 , and require that each X n+1 be a function of a sample 
£ with the same probability as £, up to the Jacobian of the transformation. This should produce 
likely and well-distributed samples. 

A little thought shows that this can be done, not by equating P n+ \ to exp(— £*£/2)/(27r) m / 2 , 
but by equating the arguments of the two exponentials. For example, if one wants to represent a 
N(0, v) random variable x with pdf exp(— ^)/\/2itv as a function of a N(0, 1) variable £ with pdf 
exp(— £ 2 /2)/v / 2~7r, equating the arguments yields x — t/v(;, clearly a good choice. Thus, we wish to 
solve the equation 

re/2 = 

= (X n+1 - X n - F n )* {GlGnY 1 (X n+1 - X n - F n ) /2 + (h{X n+1 ) - b n+1 )* (Q*Q)' 1 (h{X n+1 ) - b n+1 ) /2 

(5) 

and obtain X n+1 as a function of £. 

We proceed point by point — given a vector £, we find the corresponding X n+1 rather than 
look for an expression for the function X™ +1 (£) as a whole — and by iteration: we find a sequence of 
approximations Xj l+1 (= Xj for brevity) which converges to X n+1 ; we set Xo = 0, and now explain 
how to find Xj + i given Xj. First, expand the function h in the observation equation ((2|) in Taylor 
series around Xf 

h(X j+1 ) - h{X 3 ) + Hj ■ (X j+1 - X 3 ), (6) 

where Hj is a Jacobian matrix evaluated at Xj . The observation equation ^ can be approximated 
as: 

Zj =HjXj +1 +QW n+1 , (7) 

where Zj = b n+1 - h(Xj) + HjXj. 

The left side of equation ([5]) can be approximated as: 

(X j+1 -X n - F n )* (G^GJ- 1 (X j+1 -X n - F n ) II + (HjX j+1 - z 3 )* (Q*Q) _1 (HjX j+1 - Zj ) /2 
= (X j+1 - fhjf S7 1 (X j+1 - fhj) /2 + (8) 

where 

EJ 1 = {GlGnY 1 + H*(Q*Q)- 1 H 3 , rh 3 = Zj ((G*G n ) _1 (X" + F n ) + H*(Q*Q)- 1 z 3 ) , 

and 

Kj = HjG* n G n H* + Q*Q, * 3 - = (z 3 - H 3 (X n + F n ))* Kj 1 (z 3 - H 3 (X n + F n )) /2. 
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We now solve for as a function of £. To make the computation tractable, in this step we 

ignore the remainder <3?j-; this is a key step. We thus solve the simpler equation 

(X j+1 - fh j yVJ 1 {X j+1 - %)/2 = re/2. (9) 

This can be done in any of a number of ways; for example, one can write Sj = LjL*, where Lj is a 
lower triangular matrix and L* is its transpose, and then set Xj+i — fhj +Lj£ (a different algorithm 
was suggested in [5]). The iteration is done. 

If the sequence Xj converges to a limit, call the limit X n+1 . One can readily check that the 
approximate equation Q converges to the full observation equation @. The remainders $j also 
converge to a limit <J>" +1 . Equation (O becomes: 

rc/2+$ n+1 = 

= (X n+1 - X n - F n )* (G I *G„)- 1 (X n+l - X n - F n ) /2 + (h(X n+1 ) - b n+1 )(Q*Q)- 1 (h(X n+1 ) - b n+1 )/2. 

(10) 

Multiply this equation by —1 and exponentiate both sides: 
exp(-f ?/2)exp(-$" +1 ) = 

= exp (- (X n+1 - X n - F n )* (G^Gn)- 1 (X n+1 - X n - F n ) /2 - (h(X n+1 ) - b n+1 )*{Q*Q)- 1 (h{X n+1 ) - b n+l ) /2\ . 

(11) 

This differs from what we set out to do in equation by the factor exp(— on the right hand 
side. 

Let P(a\f3) be the probability of a given (3. The factor exp(— $ ra+1 ) is proportional to P(b n+1 \X n ), 
and equation is the statement 

P(X n+1 \X n ,b n+1 )P(b n+1 \X n ) = P(X n+1 \X n )P(b n+1 \X n+1 ), (12) 

i.e., this is Bayes' theorem. Note also that equation © is a pseudo-Gaussian representation of X n+l , 
not a Gaussian representation; the matrix Sj is a function of the sample. 

We next compute the Jacobian determinant J = det(dX n+1 / dt;). This can be often done 
analytically. Equation ^ relates X n+1 to £ implicitly. We have values of £ and the corresponding 
values of X n+1 ; to find J there is no need to solve again for X n+1 ; an implicit differentiation is all 
that is needed. Alternately, J can be found numerically, by taking nearby values of £, redoing the 
iteration (which should converge in one step, because one can start from the known value of X n+1 ), 
and differencing. 

The expression on the right-hand side of equation (fTTj) is proportional to P(b n+1 1 X n+1 )P(X n+1 \X n ), 
with a proportionality constant independent of X n . When X n+1 is sampled as just described, each 
value of X n+1 = X n+1 (£) appears with probability . 1 , 2 exp(— £*£/2)/| J\, and then the value of 

this expression is exp(— £*£/2) exp(— To get the right value of the expression on the average, 
one has to give each proposed X n+1 the sampling weight W = ^ 2 J^ m/2 exp(— J|, (with another 
factor P(X n ) if such factors are not all equal). Since ^nWT^ ^ s a constant and the same to every 
particle, we will drop it from now on. Here we see an advantage of starting from a prechosen refer- 
ence variable £: the factor exp(— £*£/2), which varies from sample to sample, has been discounted in 
advance and does not contribute to the non-uniformity of the weights. We shall see that the other 
factors can be expected to vary little. 
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Do this for all the particles and obtain new positions with weig hts Wi = exp(-$" +1 )|J i |, where 
<&™ +1 , Ji are the values of these quantities for the i-th particle. One can get rid of the weights after 
the fact by resampling, i.e., for each of M random numbers 9k, k = 1, . . . , M drawn from the uniform 
distribution on [0, 1], choose a new A™ +1 = such that A" 1 Y^l\ Wj < 9 k < A- 1 £* =1 W 3 

(where A = J2jLi anc ^ tnen suppress the hat. 

Note also that the resampling does not have to be done at every step- for example, one can 
add up the phases for a given particle and resample only when the ratio of the largest cumulative 
weight exp(— YHfii ~ 1°§ 1^*1)) to the smallest such weight exceeds some limit L (the summation is 
over the weights accrued to a particular particle i since the last resampling) . If one is worried by too 
many particles being close to each other ("depletion" in the usual Bayesian terminology), one can 
divide the set of particles into subsets of small size and resample only inside those subsets, creating 
a greater diversity. As will be seen in the numerical results section, none of these strategies is used 
here and we resample fully at every step. 

The computational complexity of this construction depends on the sparseness of the matrix , 
which depends on the sparseness of Hj in the expression ([5]), which depends on the structure of the 
function h in equation @ . In the frequently encountered situation where h is diagonal, in the sense 
that each quantity measured is a function of a single component of the vector whose dynamics are 
given by equation (fT]), one finds that £j and Hj are diagonal, and the computations, including the 
computation of the Jacobian J, are easy, whether h is linear or not. The more arguments in each of 
the components of the function h, the more labor is required. 

If both equations ([T]) and ^ are linear and the initial data are Gaussian, then the pdfs P n 
are Gaussian. We only need to find the mean and the variance of the pdf, which can be found as 
above by considering a single particle; the iterations converge in one step. The resulting means and 
variances are identical to those produced by the Kalman filter. If one had needed multiple particles, 
their weights would have been all equal. If equation (fT]) is nonlinear but equation @ is linear (or 
can be well approximated by a linear function in each interval (nS, (n + 1)S)), then the P n +\ are in 
general not Gaussian and one needs multiple particles. The iterations still converge in one step, and 
what one obtains is a version of the forward step in a filter with an optimal importance function (as 
described e.g in [6]). 

The convergence of the iteration will be very briefly discussed further below. We have chosen 
the variables £ to be independent N(0, 1) variables, but there is nothing sacred about this choice. 
The goal is to pick samples whose probability is high, and in some contexts other choices may be 
better. We will discuss those other choices when they are made in further work. 

3 Backward sampling 

In the previous section we described how to sample the pdf at time (n + 1)5 given the pdf at 
time nS. In general, this is not sufficient. Every observation provides information not only about 
the future but also about the past- it may, for example, tag as improbable earlier states that had 
seemed probable before the observation was made. Furthermore, in non-Gaussian settings, the pdf 
one obtains by going directly from time (n — 1)6 to step {n 4- 1)5 by a step of duration 25 may be 
different from the pdf one obtains after two steps that include an intermediate step. After one has 
sampled at time (n+l)5, one has to go back, correct the past, and resample (this backward sampling 
is often misleadingly explained in the literature solely by the need to create greater diversity among 
the particles). We resample by interpolation, which we present explicitly for one backward step. It 
is quite obvious one can do that for as many backward steps as are needed. 

Given a set of particles at time (n+l)5, after a forward step and maybe a subsequent resampling, 
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one can figure out where each particle i was in the previous two steps, and have a partial history for 
each particle i: X™~ x , Xf , X™ +1 (if resamplings had occurred, some parts of that history may be 
shared among several current particles) . Knowing the first and the last members of this sequence, 
we recompute X n by interpolation, thus projecting information backward one step. 

The probability of the X ncw that will replace X n is the product of the three probabilities 
(properly normalized): the probability of the new leg from X 11 ^ 1 to X n , the probability of the 
resulting leg from X n to X n+1 (the end result being known), and the probability of the resulting 
observation at time n<5, i.e.: 

cxp (- (X ncw - X"- 1 - F n _ x )* (G; i _ 1 G n _ 1 )- 1 (X ncw - X"- 1 - F n _ x ) /2 

_ ( X n+i _ X ncw _ Fn y (g;g„)- 1 (X n+1 - X n - F n ) /2 - (h(X ncw ) - &")* (Q*Q)~ 1 (h(X ncw ) - b n ) /2^j . 

(13) 

Here we recall that F n _i = F{X n ~ 1 , t™" 1 ^ and G„_i = \/5G(X n - 1 ,t n ~ 1 ) are known from the 
approximation of the SDE, F n and G n are functions of X ncw , and the subscript % referring to 
the particle has been omitted. This expression differs from equation (j4]) by having an additional 
exponential factor. 

Once again, we set up an iteration, with iterates Xj, that converges to X now , and start with 
Xq = 0. We expand h(Xj + \) in a Taylor series around Xj, so that the last factor in the expression 
(I13[) becomes a quadratic in Xj+±. We complete squares so that the argument of the exponential in 
(fT3|) can be written as (Xj+i — rhj)T,J ((Xj+i — fhj)/2 + $j\ equate (-X^+i — rhj)Y<J 1 ((Xj + i — fhj)/2 
to ^*^/2, solve to get Xj + i as a function of ^, calculate the Jacobian, and find the weight. We do 
this for all the particles, and resample as needed. This concludes the backward sampling step. Note 
that as a result of the backward step and the subsequent forward step, P n +i depends, not only on 
the positions of the particles at time n8, but also on the earlier history of the system. 

4 Sparse observations 

Consider now a situation where we do not have observations at every time step. First, assume that 
one has observation at time (n + 1)8 but not at time nS. We try to sample X n and X n+1 together 
given the observation information at time step (n + 1)6. Consider the i-th particle. Suppose we are 
given the vector X™^ 1 for that particle. Suppress again the particle index i. The joint probability 
density P„,„+i of X n and X n+1 given X"- 1 is 

^n,n+l{A ) 

=Z~ l cxp (- (X n - X n - 1 - F n _i)* {G* n _ x G n - X y x (X n - X"- 1 - F„_ x ) /2 

- (X n+1 -X n - F n )* (G I * l G„)- 1 {X n+1 - X n - F n ) /2 - (h(X n+1 ) - b n+1 )* {Q*QY 1 (h(X n+1 ) - b n+1 ) /2) , 

(14) 

where Z is the normalization constant. We recall that F„_i = F(X n ~ 1 , < n_1 )(5, G„_ x = \f6G{X n ~ l ,t n ~ 1 ) 
are known from the approximation of the SDE, F n and G n depend on X n . 

In the now familiar sequence of steps, we pick two independent samples and £«.+!) each with 
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probability density exp(— £*£/2)/(27r) m / 2 , and try to solve the equation 

£n £ra/2 + £ n+1 £„ + i/2 

= (x n - x n - x - f„_i)* (g;_ 1 g„_ 1 )- 1 (X" - X"- 1 - F n _i) /2 

+ - X" - F n )* (G*G n ) _1 - X™ - F n ) /2 + (/i(X™ +1 ) - b n+1 )* (Q*Q)- 1 (h(X n+1 ) - b n+1 ) /2, 

(15) 

to obtain X n and X n+1 as functions of £ n and £ n +i- 

We define a sequence of approximations X™ and which converge to X n and re- 

spectively; set Xq — and Xq +1 — 0, and at each iteration find X" +1 and X^ x given X™ and 
X? + , First, expand the function h in the observation equation © in Taylor series around X? +1 : 

h(X?#) = h{X^ +1 ) + H? +1 ■ (X^ 1 - (16) 

where H™ +1 is a Jacobian matrix evaluated at X™ +1 . The observation equation © is approximated 
as: 

z n+l = H n+l X n+l + (17) 

where = - /ips^ 1 ) + H r / +1 X'/ +1 . 

Let F nj - = F(X?,t n )6 and G„,j = <j5G{Xf,t n ). The right side of equation (15]) can be 
approximated as: 

(A J+1 — A — r n -l) y^n-l^n-l) ~~ A —-Fn-l)/* 

+ ( X j+i ~ -^j+i _ ^n,i) (GljG„j) _1 (^j'+i 1 ~ ^i+l ~ ^n,j) /2 

+ (i/; +1 A';+ 1 - z r ; +i y (q*q)- 1 - /2. 

(18) 

We first combine the last two terms in (fT5)l and obtain 

- X? +1 - F nd )* (G* nd G n<j )-' (X$? - - F nj ) /2 + (H n+1 X^ - (G*G) _1 {H n+1 X^ - /2 

= (Xf£ - rh] +i y (SJ+ 1 )- 1 (Xj 1 ^ 1 - fh] +1 ) /2 + (19) 



where 



n+l v^ n +l 



(sn+i)-i = {G* nd G nd )- x + (h; i+1 )*(q*q)- 1 h^+\ 

({G* nd G ntj )-\X? +1 + F n>j ) + (Hj +1 )*(Q*Qy 1 Zj +1 ) , 



K n+i = //;-'(/; ;f ;„ ,//-: r + q*,^ 



and 



We combine the first term in (|18|) and the second term in (|19p and obtain 

- X"- 1 - F n _i)* (G^iG^i)- 1 (XjVi - X"- 1 - Fn-x) /2 + 

— V i+1 — — ^n-l) ['^n-l Lr n-l) ( A j+1 — A — -Fri-l) / * 

+ - H^{X^ +1 + F nJ )f (K-+ 1 )- 1 - + F nJ )) /2 

= (X]V 1 -m; l )*( S ")" 1 (^]Vi-S l )/ 2 + <i> " 5 (20) 
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where 

(S; 1 )- 1 = (G*_ 1 G„_i) _1 + {H- +1 y{K j3 n+1 )- 1 H'; + \ 
k? = i/; +1 G;_ 1 G„_ 1 (i/; +1 )* + kj +1 , 

and 

= - H? +1 (F nij + X"- 1 + F n _i))* (K;)- 1 - H^ +1 (F nj + X"- 1 + F n _i)) /2. 

Combining ([151), OS), ([19 ]) . and (pO jl . we try to solve 

£n*£n/2 + Ci+lCn+l/2 

= (X^ - m^Y (X%± - m? +1 ) /2 + (X? +1 - m«)* (EJ)" 1 - m") /2 + 4>». 

(21) 

We now solve for X" +1 and -Xj+i as functions of and ignoring the remainders i.e. we 

solve the simpler equations 

(X^ +1 -m J fc r(S))- 1 (X^ +1 -mf)/2 = ^ fc /2, fc = n,n+l (22) 

If the sequences X™ and converge to limits, call the limits X n and X n+1 . In the limit, the 

approximate equation (jTTJ) converges to the full observation equation The remainders <E>™ and 
also converge to limits and Equation (|15p becomes: 

= (X" - X"- 1 - F n _ x )* (G*.^^!)- 1 (X" - X"- 1 - F„_i) /2 

+ (X n+1 -X n - F n )* {GIGn)- 1 (X n+1 - X n - F n ) /2 + {h(X n+1 ) - &" +1 )(Q*Q)- 1 (/i(X" +1 ) - b n+1 )/2. 

(23) 

Multiply by —1 and exponentiate: 

exp(-CW2) exp(-£ +1 &,+i/2) exp(-<&") 
= exp ((X" - X"- 1 - F n ^)* (G;_ 1 G„_ 1 )- 1 (X" - X"- 1 - iVr) /2 

+ (X n+1 -X n - F n )* (G;G„)- 1 - X n - F n ) /2 + (h(X n+1 ) - b n+1 )*{Q*Q)- 1 (h(X n+1 ) - b n+1 ) /2) . 

(24) 

As before, one has to give each proposed X n and X n+1 the sampling weight W = exp(— $™)| J|, 
where J is the Jacobian J = det(0(X" , X n+1 ) / <9(£ n , Cn+i)) which must be computed. One does this 
for all particles and resamples as needed. This process can be generalized if one wishes to sample 
at more times between observations. One should also note that the procedure just described may 
make the evaluation of Jacobians significantly more onerous, but still often tractable. 

The construction of this paragraph is important because many data sets one tries to assimilate 
are indeed sparse, and also for the following reason. We have not provided in this present paper a 
discussion of the convergence of the iterations we use. This convergence depends on the structure of 
the underlying SDE, on the scheme used to approximate it, and on the specific ways one solves for the 
new increments in terms of the reference variables £, and cannot be analyzed without considering 
these specifics. In our previous paper [5] we analyzed a special case and found that there the 
convergence depended on the size of the time step. We conjecture that this happens frequently. The 
present section provides a way to decrease the time step as a device for repairing diverging iterations 
without much additional thought. 



5 Example 1 



We apply our filter to a prototypical marine ecosystem model studied in |10j . We set the main 
parameters equal to the ones in [lOj ; however, we will also present some results with a range of 
noise variances to make a particular point. We did the data assimilation with the filter described 
above, without back sampling, and also by the a standard particle filter SIR (Sampling importance 
resampling), see pQ. 

The model involves four state variables: phytoplankton P (microscopic plants), zooplankton Z 
(microscopic animals), nutrients N (dissolved inorganics), and detritus D (particulate organic non- 
living matter). At the initial time t = we have P(0) = 0.125, Z(0) = 0.00708, N(0) = 0.764, and 
-D(O) = 0.136. The system is described by the following nonlinear ordinary differential equations, 
explained in [TU] : 

dP N P 

ft = 0.18^Z-0.1Z + JV(0,oS) 

f - 0.m + 0.24^Z- 7 P^ + 0.05Z + A,(0,^) 
^ = -0.LD + OTP + 0.18 Q 1 P +p z + °- 05Z + N(0, a%), (25) 
where the parameter 7 , the " growth rate" , is determined by the equations given by 

7t = 0.14 + 3A 7t , A 7t = 0.9A 7t _i + N(0, o*). 

The variances of the noise terms are: 0% = (0.01P(0)) 2 , erf = (0.01Z(0)) 2 , cr 2 N = (O.OIN(O)) 2 , 
ajj = (0.01P(0)) 2 , and a 2 = (0.01) 2 . 

The observations were obtained from NASA's SeaWiFS satellite ocean color images. These 
observations provide a time series for phytoplankton; the relation between the observations P(£) bs 
(corresponding to the vector b n in the earlier discussion) and the solution P(t) of the equation of 
the first equation in (|25|) is assumed to be: 

logP(t) obs = logP(t) + N(0, a 2 bs ), 

where cr 2 ^ g = 0.3 2 . Note that this observation equation is not linear. There are 190 data points 
distributed from late 1997 to mid 2002. The sample intervals ranged from a week to a month or 
more, for details see [10]. As in [10], we discretize the system ([25]) by an Euler method with At = 1 
day and prohibit the state variables from dropping below 1 percent of their initial values. 

We have compared our filter and SIR in three sets of numerical experiments, all with the same 
initial values as listed above. In each case we attempted to find a trajectory of the system consistent 
with the fixed data, and observed how well we succeeded. In the first set of the experiments, we 
used 100 particles and take a 2 P = (O.OIP(O)) 2 as in [10], In this case, the (assumed) variance of 
the system is much smaller than the (assumed) variance of the observations; the particle paths are 
bunched close together, and the results from our filter and from SIR are quite close, see Figure 1, 
where we plotted the P component of the reconstructed solution as well as the corresponding data. 

In the second set of the experiments, we still used 100 particle but assumed cr 2 = (P(0)) 2 . 
The variance of the system is now comparable to the variance of the observation. For SIR, after 
resampling, the number of the distinct particles is smaller than in the first CM SC. clS cl result of the loss 
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Table 1: The number of distinct particles after resampling with different system variances and 
different numbers of particles 
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of diversity after resampling when the weights are very different from each other, see Table 1, where 
we exhibit the average number of distinct particles left after each rcsample; there is a resample after 
each step. Remember that there is some loss of diversity in resampling even if all the weights are 
equal. With 100 particles, the filtered results with SIR are still comparable to those with our filter. 
See Figure 2. 

In the third set of the experiments, we used only 10 particles and kept cr 2 — (P(0)) 2 . As one 
could have foreseen, our filter does better than SIR, see Figure 3. One should remember however 
that we are working with a low dimensional problem where the differences between filters are not 
expected to be very significant; the cost if 100 particles is not prohibitive. 



6 Example 2 

We consider next a simple high dimensional example, used in [14| to show how particle filters fail 
when the number of dimensions is large. We assume that each component of X n is an independent 
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Figure 2: Results with a 2 P = P(0) 2 and 100 particles 
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Gaussian with zero mean and unit variance. This is equivalent to taking 6 = 1, F(X n ,S) = 0, 
G(X n , t n ) = I in equation ([3]), and eliminating the X n term. We have 

X n = V n . 

Each component of X n is observed individually, so that 

b n = x n + w n 

We implement our filter with these particular choices. At the j-th iteration, Hj = I in equation ([6]) 
and z 3 = b n+1 in equation Q. Therefore, we have ST 1 = 21, % = b n+1 /2, and $ 3 = (6™+ 1 )*6™+ 1 /4, 
in equation ([5]). The iterations converge in one step and all the particles have the same weights. 

However, with SIR the weights are uneven. We ran the SIR filter 1000 times, with a 1000 
particles each time; in each run we normalized the weights so that add up to one, and we recorded 
the maximum weight. In Figures 4 we display a histogram of these recorded maximum weights. As 
one can observe, when the number of dimensions is large, most of time, a single particle in each run 
hogs all the probability, and this version of SIR fails. 



7 Conclusions 

We have presented a general form of the iteration and interpolation process used in our new implicit 
nonlinear particle filter. The goal is to aim particle paths sharply so that fewer are needed. We 
conjecture that there is no general way to reduce the variability of the weights in particle sampling 
further than we have. We also presented additional simple examples that illustrate the potential of 
this new sampling. These examples are simple in that one is low-dimensional, while the second is 
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Figure 4: Histogram of the SIR normalized maximum particle weights with 1000 runs for 100 
dimensions 
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linear so that other effective ways of sampling it do exist. High-dimensional nonlinear problems where 
our filter may be indispensable will be presented elsewhere, in the context of specific applications. 
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